在许多实际应用程序中,顺序规则挖掘(SRM)可以为各种服务提供预测和建议功能。这是模式挖掘的重要技术,可以发现所有属于高频和高信顺序规则的有价值的规则。尽管提出了一些SRM的算法来解决各种实际问题,但没有关于目标顺序规则的研究。有针对性的顺序规则挖掘旨在挖掘用户关注的有趣的顺序规则,从而避免产生其他无效和不必要的规则。这种方法可以进一步提高用户在分析规则和减少数据资源消耗方面的效率。在本文中,我们提供了目标顺序规则的相关定义,并制定了目标顺序规则挖掘的问题。此外,我们提出了一种有效的算法,称为靶向顺序规则挖掘(TASRM)。引入了几种修剪策略和优化,以提高TASRM的效率。最后,在不同的基准测试上进行了大量实验,我们根据其运行时间,内存消耗和可扩展性以及具有不同查询规则的查询情况分析结果。结果表明,与现有的基线算法相比,新型算法TASRM及其变体可以实现更好的实验性能。
translated by 谷歌翻译
公用事业驱动的挖掘是数据科学中的重要任务,在现实生活中有许多应用。高实用程序顺序模式采矿(HUSPM)是一种公用事业驱动挖掘。 HUSPM旨在发现具有高效的所有连续模式。然而,HUSPM的现有算法无法提供处理某些预测或推荐的某些方案的准确概率。提出了高实用序列规则挖掘(HusRM)以发现具有高效用和高信心的所有连续规则。仅针对Husrm提出的一种算法,这是不够高效的。在本文中,我们提出了一种更快的算法,称为美国规则,以有效地挖掘高实用程序顺序规则。它利用规则估计的公用事业共同发生修剪策略(REUCP)来避免无意义的计算。为了提高密集和长序列数据集的效率,提出了四个更严格的上限(LEEU,REEU,LERSU,REREU)及其相应的修剪策略(Leeup,Reeup,Lersup,Rersup)。此外,美国规则提出了规则估计的实用程序重新计算修剪策略(REURP)来处理稀疏数据集。最后,与最先进的算法相比,不同数据集的大量实验表明,在执行时间,内存消耗和可扩展性方面,可以实现更好的性能。
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning technology. However, the existing methods usually require a large computational cost. Meanwhile, the activation function will cause some features of the intermediate layer to be lost. Therefore, it is a challenge to make the model lightweight while reducing the impact of intermediate feature loss on the reconstruction quality. In this paper, we propose a Feature Interaction Weighted Hybrid Network (FIWHN) to alleviate the above problem. Specifically, FIWHN consists of a series of novel Wide-residual Distillation Interaction Blocks (WDIB) as the backbone, where every third WDIBs form a Feature shuffle Weighted Group (FSWG) by mutual information mixing and fusion. In addition, to mitigate the adverse effects of intermediate feature loss on the reconstruction results, we introduced a well-designed Wide Convolutional Residual Weighting (WCRW) and Wide Identical Residual Weighting (WIRW) units in WDIB, and effectively cross-fused features of different finenesses through a Wide-residual Distillation Connection (WRDC) framework and a Self-Calibrating Fusion (SCF) unit. Finally, to complement the global features lacking in the CNN model, we introduced the Transformer into our model and explored a new way of combining the CNN and Transformer. Extensive quantitative and qualitative experiments on low-level and high-level tasks show that our proposed FIWHN can achieve a good balance between performance and efficiency, and is more conducive to downstream tasks to solve problems in low-pixel scenarios.
translated by 谷歌翻译
Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
translated by 谷歌翻译
Motivated by the problem of matching vertices in two correlated Erd\H{o}s-R\'enyi graphs, we study the problem of matching two correlated Gaussian Wigner matrices. We propose an iterative matching algorithm, which succeeds in polynomial time as long as the correlation between the two Gaussian matrices does not vanish. Our result is the first polynomial time algorithm that solves a graph matching type of problem when the correlation is an arbitrarily small constant.
translated by 谷歌翻译
The task of reconstructing 3D human motion has wideranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view Mo- Cap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion thus making exciting applications like motion analysis and motiondriven animation accessible to the general public. However, performance of existing HMR methods degrade when the video contains challenging and dynamic motion that is not in existing MoCap datasets used for training. This reduces its appeal as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action,and show that NeMo achieves better 3D reconstruction compared to various baselines.
translated by 谷歌翻译
Consensus clustering aggregates partitions in order to find a better fit by reconciling clustering results from different sources/executions. In practice, there exist noise and outliers in clustering task, which, however, may significantly degrade the performance. To address this issue, we propose a novel algorithm -- robust consensus clustering that can find common ground truth among experts' opinions, which tends to be minimally affected by the bias caused by the outliers. In particular, we formalize the robust consensus clustering problem as a constraint optimization problem, and then derive an effective algorithm upon alternating direction method of multipliers (ADMM) with rigorous convergence guarantee. Our method outperforms the baselines on benchmarks. We apply the proposed method to the real-world advertising campaign segmentation and forecasting tasks using the proposed consensus clustering results based on the similarity computed via Kolmogorov-Smirnov Statistics. The accurate clustering result is helpful for building the advertiser profiles so as to perform the forecasting.
translated by 谷歌翻译
Learning with noisy label (LNL) is a classic problem that has been extensively studied for image tasks, but much less for video in the literature. A straightforward migration from images to videos without considering the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) A lightweight channel selection method dubbed as Channel Truncation for feature-based label noise detection. This method selects the most discriminative channels to split clean and noisy instances in each category; 2) A novel contrastive strategy dubbed as Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed tru{\bf N}cat{\bf E}-split-contr{\bf A}s{\bf T} (NEAT) significantly outperforms the existing baselines. By reducing the dimension to 10\% of it, our method achieves over 0.4 noise detection F1-score and 5\% classification accuracy improvement on Mini-Kinetics dataset under severe noise (symmetric-80\%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6\%.
translated by 谷歌翻译